Difference Between Matplotlib and Seaborn

Data visualization is the graphic representation of data. It condenses a large dataset into compact graphs and thus aids data analysis and prediction. It is an indispensable element of data science because it makes complex data more understandable and accessible. Matplotlib and Seaborn act as the backbone of data visualization in Python.

Matplotlib: It is a Python library used for plotting graphs, usually together with other libraries such as NumPy and Pandas. It is used for drawing statistical inferences and for plotting 2D graphs of arrays.

Seaborn: It is also a Python library used for plotting graphs, and it works together with Matplotlib, Pandas, and NumPy. It is built on top of Matplotlib and is best seen as a higher-level layer over the Matplotlib library. It helps in visualizing univariate and bivariate data. It provides attractive themes for styling Matplotlib graphics. It is an important tool for visualizing linear regression models. It is useful for plotting statistical time-series data. It helps reduce overlapping plot elements and also aids in their beautification.
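To make the comparison concrete, the sketch below draws the same grouped scatter plot twice, once with plain Matplotlib and once with Seaborn. The small DataFrame and its column names (x, y, group) are made up for illustration and are not one of this tutorial's datasets.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, only for illustration
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2.0, 2.9, 4.1, 4.8, 6.2, 6.9],
    "group": ["a", "b", "a", "b", "a", "b"],
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: loop over the groups and label each one by hand
for name, part in df.groupby("group"):
    ax_mpl.scatter(part["x"], part["y"], label=name)
ax_mpl.set(xlabel="x", ylabel="y", title="Matplotlib")
ax_mpl.legend(title="group")

# Seaborn: a single call; hue splits the groups and builds the legend
sns.scatterplot(x="x", y="y", hue="group", data=df, ax=ax_sns)
ax_sns.set_title("Seaborn")

fig.tight_layout()
```

Both panels show the same data; the Seaborn call simply takes over the grouping, coloring, and legend work that Matplotlib leaves to the user.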

In [44]:
# Import libraries
import seaborn as sns # for Data visualization
from scipy.stats import norm # normal distribution, used later for the fit parameter
import matplotlib.pyplot as plt # for Data visualization
 
# pandas is used for DataFrames and read_csv in this tutorial
import pandas as pd # for data analysis
import numpy as np # for numerical arrays

1. Seaborn Line Plot

If you have two numeric variables and want to know how they are related, the seaborn line plot function will help. The seaborn library provides the sns.lineplot() function to draw a line graph of two numeric variables such as x and y.

Syntax: sns.lineplot(x=None, y=None, hue=None, size=None, style=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, dashes=True, markers=None, style_order=None, units=None, estimator='mean', ci=95, n_boot=1000, sort=True, err_style='band', err_kws=None, legend='brief', ax=None, **kwargs)

In [14]:
#Import dataset from GitHub Seaborn Repository
tips_df = sns.load_dataset("tips")
tips_df
Out[14]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
... ... ... ... ... ... ... ...
239 29.03 5.92 Male No Sat Dinner 3
240 27.18 2.00 Female Yes Sat Dinner 2
241 22.67 2.00 Male Yes Sat Dinner 2
242 17.82 1.75 Male No Sat Dinner 2
243 18.78 3.00 Female No Thur Dinner 2

244 rows × 7 columns

In [8]:
# First, let's take some values as lists
days = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
temperature = [36.6, 37, 37.7,39,40.1,43,43.4,45,45.6,40.1,44,45,46.8,47,47.8]

# create a DataFrame from the two lists, days and temperature
temp_df = pd.DataFrame({"days":days, "temperature":temperature})
 
# Draw line plot
sns.lineplot(x = "days", y = "temperature", data=temp_df,)
plt.show() # to show graph
In [15]:
# Let's take data from the imported dataset
sns.lineplot(x = "total_bill", y = "tip", data = tips_df )
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cfffd3e648>
In [16]:
# Draw line plot of tip and size
sns.lineplot(x = "tip", y = "size", data = tips_df)
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cfffd954c8>

Seaborn Line Plot with Multiple Parameters

So far, we have drawn line plots using only the x, y and data parameters. Now, let's use more parameters and see the output.

hue => Draws a separate line plot for each level of a third, categorical variable. The next example shows the relationship between size (x-axis) and total_bill (y-axis), with separate lines for the Female and Male categories of the variable sex.

style => Gives each line plot its own style, such as a different dash pattern or marker.

palette => Sets the colormap for the graph. Pass the name of a palette or colormap, such as "hot".

dashes => Pass False to draw every line solid, or True to give each style level its own dash pattern.

markers => Sets the markers drawn at each point (x1, y1). The marker codes are the same as those used in matplotlib line plots.

legend => Controls the legend. The default value is "brief", but you can pass "full" or False. False hides the legend.

In [22]:
# Draw line plot of size and total_bill with parameters
sns.lineplot(x = "size", y="total_bill", data=tips_df, hue="sex",
            style = "sex", palette = "hot", dashes = False, 
            markers = ["o", "<"],legend="brief",)


plt.title("Line Plot", fontsize = 20) # for title
plt.xlabel("Size", fontsize = 15) # label for x-axis
plt.ylabel("Total Bill", fontsize = 15) # label for y-axis
plt.show()

Seaborn set style and figure size

Above, the line plot looks small and its background is white, but you can change both using the plt.figure() and sns.set() functions.

In [23]:
plt.figure(figsize = (16,9)) # figure size with ratio 16:9
sns.set(style='darkgrid',) # background darkgrid style of graph 
 
# Draw line plot of size and total_bill with parameters
sns.lineplot(x = "size", y = "total_bill", data = tips_df, hue = "sex",
            style = "sex", palette = "hot", dashes = False, 
            markers = ["o", "<"],  legend="brief",)
 
plt.title("Line Plot", fontsize = 20)
plt.xlabel("Size", fontsize = 15)
plt.ylabel("Total Bill", fontsize = 15)
plt.show()

Using the hue parameter of sns.lineplot(), we can draw multiple line plots. The graphs above draw two line plots in a single graph (Female and Male). In the same way, we can use the categorical variable day, which has 4 categories in total.

In [24]:
plt.figure(figsize = (16,9))
sns.set(style='darkgrid',)
 
# Draw line plot of size and total_bill with parameters and hue "day"
sns.lineplot(x = "size", y = "total_bill", data = tips_df, hue = "day",
            style = "day", palette = "hot", dashes = False, 
            markers = ["o", "<", ">", "^"],  legend="brief",)
 
plt.title("Line Plot", fontsize = 20)
plt.xlabel("Size", fontsize = 15)
plt.ylabel("Total Bill", fontsize = 15)
plt.show()

2. Seaborn Histogram Using Sns.Distplot()

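If you have a numeric dataset and want to visualize it as a histogram, the seaborn histogram will help you.

You may wonder why you would plot a histogram with seaborn's distplot when matplotlib's plt.hist() does the same job. By default, distplot also overlays a kernel density estimate (kde=True) on the histogram, and it accepts further options such as rug and fit.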

In [25]:
#Plot Histogram of "size", Taken "size" from above dataset
sns.distplot(tips_df["size"])
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cfffee9d08>
In [26]:
#Plot Histogram of "tip"
sns.distplot(tips_df["tip"])
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf80560588>

How to modify the seaborn histogram?

The seaborn distplot function has a number of parameters that help to customize the histogram.

Syntax: sns.distplot( a, bins=None, hist=True, kde=True, rug=False, fit=None, hist_kws=None, kde_kws=None, rug_kws=None, fit_kws=None, color=None, vertical=False, norm_hist=False, axlabel=None, label=None, ax=None, )

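  • a: Pass numeric data as a Series, 1D array, or list to plot the histogram, as in the examples above.

  • bins: Sets the number of bins, or the bin edges when a list is passed. For example, if the data ranges from 1 to 55, you can pass edges in steps of 5 so that each bar covers 5 units. The next example instead passes bins=55, which gives 55 bars.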

In [27]:
#Plot Histogram of "total_bill" with bins parameters
sns.distplot(tips_df["total_bill"], bins=55)
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cffc1e9648>
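The bins argument can be an integer (the number of equal-width bins) or a list of bin edges. The sketch below uses np.histogram, which follows the same convention, to show how a list of edges in steps of 5 groups the values. The sample values are invented for illustration.

```python
import numpy as np

# Made-up bill amounts, only for illustration
values = np.array([3.1, 7.5, 9.9, 12.0, 18.4, 21.0, 24.9, 33.3, 48.2])

# Integer form: ten equal-width bins between the minimum and maximum
counts_int, edges_int = np.histogram(values, bins=10)

# List form: explicit edges 0-5, 5-10, ..., 50-55
edges = list(range(0, 60, 5))
counts_edges, _ = np.histogram(values, bins=edges)

print(len(edges_int) - 1)  # number of bins produced by the integer form
print(counts_edges)        # how many values fall between each pair of edges
```

With explicit edges the bars line up with round numbers, which is why the larger example later in this section passes a list of edges and reuses it for plt.xticks().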
In [28]:
# hist: Pass False if you don't need the histogram bars; the default is True.
#Plot "total_bill" with hist = False, which leaves only the kde curve
sns.distplot(tips_df["total_bill"], hist = False)
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf806f4ec8>
In [29]:
# kde: kde stands for "kernel density estimate"; pass True to show it or False to hide it.
#Plot Histogram of "total_bill" without the kde (kernel density estimate) curve
sns.distplot(tips_df["total_bill"], kde=False,)
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf80768f48>
In [30]:
#Plot Histogram of "total_bill" with axlabel parameters
sns.distplot(tips_df["total_bill"],axlabel="Total Bill",)
Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf80793508>
In [31]:
# label: Give a label to the sns histogram. It is only displayed when matplotlib.pyplot's plt.legend() function is called.
#Plot Histogram of "total_bill" with label parameters
sns.distplot(tips_df["total_bill"],label="Total Bill",)
 
plt.title("Histogram of Total Bill") # for histogram title
plt.legend() # for label
Out[31]:
<matplotlib.legend.Legend at 0x1cf8058cf88>
In [34]:
# fit: Fit a distribution to the data. Pass norm (from scipy.stats import norm) and set kde to False.
#Plot Histogram of "total_bill" with fit and kde parameters
sns.distplot(tips_df["total_bill"],fit=norm, kde = False) # fit=norm requires: from scipy.stats import norm
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf80039fc8>
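The fit parameter takes a distribution object such as scipy.stats.norm. Conceptually, the distribution's parameters are estimated from the data and the resulting density curve is drawn over the histogram. The sketch below reproduces that idea by hand with scipy alone. It is an illustration under that assumption rather than seaborn's own code, and the data is randomly generated.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
bills = rng.normal(loc=20.0, scale=8.0, size=500)  # hypothetical stand-in for total_bill

# Maximum-likelihood estimates of the normal distribution's mean and standard deviation
mu, sigma = norm.fit(bills)

# Evaluate the fitted density on a grid; this is the kind of curve drawn over the bars
grid = np.linspace(bills.min(), bills.max(), 200)
density = norm.pdf(grid, mu, sigma)

print(round(mu, 2), round(sigma, 2))
```

For a normal distribution the fitted parameters are simply the sample mean and the (population) standard deviation of the data.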
In [35]:
# Best way to plot a seaborn histogram


#Plot histogram in best format
plt.figure(figsize=(16,9))
sns.set() # for style
 
bins = [1,5,10,15,20,25,30,35,40,45,50,55]
sns.distplot(tips_df["total_bill"],bins=bins,
            hist_kws = {'color':'#DC143C', 'edgecolor':'#aaff00',
                       'linewidth':5, 'linestyle':'--', 'alpha':0.9},
             
            kde=False,
            fit = norm,
            fit_kws = {'color':'#8e00ce', 
                       'linewidth':12, 'linestyle':'--', 'alpha':0.4},
            rug = True,
            rug_kws = {'color':'#0426d0', 'edgecolor':'#00dbff',
                       'linewidth':3, 'linestyle':'--', 'alpha':0.9},
            label = "TB")
 
plt.xticks(bins)
plt.title("Histogram of Restaurant Total Bill", fontsize = 20)
plt.xlabel("Total Bill", fontsize = 15)
plt.legend()
plt.show()
In [36]:
# Plot multiple seaborn histogram in single graph
plt.figure(figsize=(16,9))
sns.set() # for style
sns.distplot(tips_df["total_bill"], bins=9, label="total_bill")
sns.distplot(tips_df["tip"], bins=9, label="tip")
sns.distplot(tips_df["size"], bins=9, label = "size")
 
plt.legend()
Out[36]:
<matplotlib.legend.Legend at 0x1cfffd80988>

3. Seaborn Barplot – Sns.Barplot()

If you have x and y variables and want to find the relationship between them using a bar graph, the seaborn barplot will help you. The sns.barplot() function draws a bar plot conveniently.

Bar graph or Bar Plot: A bar plot visualizes a numeric variable against a categorical variable in a graph to find the relationship between them.

Syntax: sns.barplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, estimator=np.mean, ci=95, n_boot=1000, units=None, orient=None, color=None, palette=None, saturation=0.75, errcolor='.26', errwidth=None, capsize=None, dodge=True, ax=None, **kwargs)

In [37]:
# Plot barplot
sns.barplot()
Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cffda14348>
In [38]:
# sns.barplot() x, y parameters
# Plot tips_df.day and tips_df.total_bill barplot
sns.barplot(x = tips_df.day, y = tips_df.total_bill)
Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf80421d48>
In [39]:
# data: Pass a DataFrame, array, or list of arrays (optional)
# Pass the dataset using the data parameter and refer to its columns by name
sns.barplot(x = 'day', y = 'total_bill', hue = 'sex',
           data = tips_df)
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf80167748>
In [42]:
# Above, the days on the x-axis are not in the order we want
# Let's put them in order
# modify the order of day
order = ['Sun', 'Thur', 'Fri', 'Sat']

sns.barplot(x = 'day', y = 'total_bill', hue = 'sex', 
            data = tips_df, order=order)
Out[42]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf80218208>
In [43]:
# If you want to arrange the graph in order according to male and female
#Modify hue order
hue_order = ['Female', 'Male']
 
sns.barplot(x = 'day', y = 'total_bill', hue = 'sex',
           data = tips_df, hue_order = hue_order)
Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf80295e88>
In [45]:
# sns.barplot() estimator parameter
# It accepts a NumPy statistical function such as mean, median, max, or min to apply within each categorical bin.
# The y values in each bin are aggregated with this function and then plotted.
# For example, np.max makes each bar as tall as the largest total_bill in its group.
sns.barplot(x = 'day', y = 'total_bill', hue = 'sex',
           data = tips_df, estimator= np.max)
Out[45]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf803238c8>
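To see what the estimator does, the same aggregation can be reproduced with a pandas groupby. This sketch uses a small invented DataFrame (demo_df) rather than the tips dataset:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data, only for illustration
demo_df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun", "Sun"],
    "total_bill": [10.0, 30.0, 12.0, 18.0, 24.0],
})

# What each estimator produces per day
by_day = demo_df.groupby("day")["total_bill"]
print(by_day.mean())  # what the default estimator would plot
print(by_day.max())   # what estimator=np.max plots

fig, ax = plt.subplots()
sns.barplot(x="day", y="total_bill", data=demo_df, estimator=np.max, ax=ax)
```

The bar heights in the plot correspond to the by_day.max() values, so switching the estimator changes the statistic being shown, not just the look of the chart.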
In [46]:
# sns.barplot() kwargs parameter
# Extra keyword arguments are passed on to matplotlib and help to style the bars.
# Keyword Arguments parameter
kwargs = {'alpha':0.9, 'linestyle':':', 'linewidth':5, 'edgecolor':'k'}
 
sns.barplot(x = 'day', y = 'total_bill', hue = 'sex',
           data = tips_df,**kwargs)
Out[46]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf803a97c8>

Example of Seaborn Barplot

So far, we have used the barplot parameters one at a time. Now it is time to use them together to produce a professional-looking plot. The barplot example below also uses some other functions:

  • sns.set() – for the dark grid background style
  • plt.figure() – for figure size
  • plt.title() – for barplot title
  • plt.xlabel() – for x-axis label
  • plt.ylabel() – for y-axis label
  • plt.savefig() – for saving the figure
  • plt.show() – for displaying the figure
In [47]:
# Example of Seaborn Barplot
sns.set()
plt.figure(figsize = (16,9))
 
sns.barplot(x = 'day', y = 'total_bill', 
           data = tips_df, alpha =1, linestyle = "-.", linewidth = 3,
           edgecolor = "k")
 
plt.title("Barplot of Days and Total Bill", fontsize = 20)
plt.xlabel("Days", fontsize = 15)
plt.ylabel("Total Bill", fontsize = 15)
 
plt.savefig("Barplot of Days and Total Bill")
plt.show()

4. Seaborn Scatter Plot Using Sns.Scatterplot()

If you have numeric x and y variables, or one of them is categorical, and you want to find the relationship between x and y to gain insights, the seaborn scatter plot function sns.scatterplot() will help.

Along with the sns.scatterplot() function, seaborn has other functions that can draw scatter plots, such as sns.lmplot(), sns.relplot(), and sns.pairplot(). However, sns.scatterplot() is the most direct way to create a seaborn scatter plot.

Syntax: sns.scatterplot(x=None, y=None, hue=None, style=None, size=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=True, style_order=None, x_bins=None, y_bins=None, units=None, estimator=None, ci=95, n_boot=1000, alpha='auto', x_jitter=None, y_jitter=None, legend='brief', ax=None, **kwargs)

In [49]:
#Import dataset from GitHub Seaborn Repository
titanic_df = sns.load_dataset("titanic")
titanic_df 
Out[49]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
886 0 2 male 27.0 0 0 13.0000 S Second man True NaN Southampton no True
887 1 1 female 19.0 0 0 30.0000 S First woman False B Southampton yes True
888 0 3 female NaN 1 2 23.4500 S Third woman False NaN Southampton no False
889 1 1 male 26.0 0 0 30.0000 C First man True C Cherbourg yes True
890 0 3 male 32.0 0 0 7.7500 Q Third man True NaN Queenstown no True

891 rows × 15 columns

In [50]:
# Method 1:
# Draw Seaborn Scatter Plot to find relationship between age and fare
sns.scatterplot(x = "age", y = "fare", data = titanic_df)
Out[50]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf8032c188>
In [51]:
# Method 2:
# Draw Seaborn Scatter Plot to find relationship between age and fare
sns.scatterplot(x = titanic_df.age, y = titanic_df.fare)
Out[51]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cfffe53808>
In [52]:
# Method 3:
# Draw Seaborn Scatter Plot to find relationship between age and fare
sns.scatterplot(x = titanic_df['age'], y = titanic_df['fare'])
Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cfffed1408>
In [53]:
# sns.scatterplot() hue parameter
# hue: Pass the name of a column in the DataFrame, or a vector, to color the points by category (optional)

# scatter plot hue parameter
sns.scatterplot(x = "age", y = "fare", data = titanic_df, hue = "sex")
Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf818a19c8>
In [54]:
# The hue_order parameter changes the order of the hue categories.
# scatter plot hue_order parameter
sns.scatterplot(x = "age", y = "fare", data = titanic_df, hue = "sex",
               hue_order= ['female', 'male'])
Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cffc1e9348>
In [55]:
# sns.scatterplot() ax (Axes) parameter
# used ax.set() method to change the scatter plot x-axis, y-axis label, and title.
ax = sns.scatterplot(x = "age", y = "fare", data = titanic_df, )
 
ax.set(xlabel = "Age",
      ylabel = "Fare",
      title = "Seaborn Scatter Plot of Age and Fare")
Out[55]:
[Text(0, 0.5, 'Fare'),
 Text(0.5, 0, 'Age'),
 Text(0.5, 1.0, 'Seaborn Scatter Plot of Age and Fare')]

sns.scatterplot() kwargs (Keyword Arguments)

The seaborn sns.scatterplot() function accepts the keyword arguments of matplotlib's plt.scatter(), such as:

  • edgecolor: Change the edge color of the scatter points. Pass a color code, color name, or hex code.
  • facecolor: Change the face (fill) color of the scatter points. Pass a color code, color name, or hex code.
  • linewidth: Change the width of the marker edge lines. Pass a float or int value.
  • linestyle: Change the style of the marker edge lines. Pass a matplotlib line style such as '--'.
In [56]:
# scatter plot kwargs (keyword arguments)
plt.figure(figsize=(16,9)) # figure size in 16:9 ratio
 
kwargs  =   {'edgecolor':"r",
             'facecolor':"k",
             'linewidth':2.7,
             'linestyle':'--',
            }
 
sns.scatterplot(x = "age", y = "fare", data = titanic_df, size = "sex", sizes = (500, 1000), alpha = .7,  **kwargs)
Out[56]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf819fcb08>

5. Seaborn Heatmap Using Sns.Heatmap()

sns is the short name commonly used for the seaborn Python library. A heatmap is mainly used to show 2D (two-dimensional) data in graphical format. Each data value is represented as a cell in a matrix, and the cell's color reflects its value.

Syntax: sns.heatmap(data, vmin=None, vmax=None, cmap=None, center=None, robust=False, annot=None, fmt='.2g', annot_kws=None, linewidths=0, linecolor='white', cbar=True, cbar_kws=None, cbar_ax=None, square=False, xticklabels='auto', yticklabels='auto', mask=None, ax=None, **kwargs)

In [57]:
# Let's create 2D array
array_2d = np.linspace(1,5,12).reshape(4,3) # create numpy 2D array
 
print(array_2d) # print numpy array
sns.heatmap(array_2d) # create heatmap
[[1.         1.36363636 1.72727273]
 [2.09090909 2.45454545 2.81818182]
 [3.18181818 3.54545455 3.90909091]
 [4.27272727 4.63636364 5.        ]]
Out[57]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf81a87848>

Seaborn heatmap using DataFrame

In [58]:
globalWarming_df = pd.read_csv("Who_is_responsible_for_global_warming.csv")
globalWarming_df.head()
Out[58]:
Country Name Country Code Indicator Name Indicator Code 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
0 United States USA CO2 emissions (metric tons per capita) EN.ATM.CO2E.PC 20.178751 19.636505 19.613404 19.564105 19.658371 19.591885 19.094067 19.217898 18.461764 17.157738 17.442862 16.976957 16.310471 16.323477 16.502837
1 United Kingdom GBR CO2 emissions (metric tons per capita) EN.ATM.CO2E.PC 9.199549 9.233175 8.904123 9.053278 8.989140 8.982939 8.898710 8.617164 8.424424 7.574622 7.857836 7.079298 7.355898 7.145844 6.497440
2 India IND CO2 emissions (metric tons per capita) EN.ATM.CO2E.PC 0.979870 0.971698 0.967381 0.992392 1.025028 1.068563 1.121982 1.193210 1.310098 1.431844 1.397009 1.476686 1.598099 1.591438 1.730000
3 China CHN CO2 emissions (metric tons per capita) EN.ATM.CO2E.PC 2.696862 2.742121 3.007083 3.524074 4.037991 4.523178 4.980314 5.334910 5.701915 6.010102 6.560520 7.241515 7.424751 7.557211 7.543908
4 Russian Federation RUS CO2 emissions (metric tons per capita) EN.ATM.CO2E.PC 10.627121 10.669603 10.715901 11.090647 11.120627 11.253529 11.669122 11.672457 12.014507 11.023856 11.694348 12.334881 12.784979 12.393556 11.857528
In [59]:
# set country name as index and drop Country Code, Indicator Name and Indicator Code
 
globalWarming_df  = globalWarming_df.drop(columns=['Country Code', 'Indicator Name', 'Indicator Code']).set_index('Country Name')
globalWarming_df
Out[59]:
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
Country Name
United States 20.178751 19.636505 19.613404 19.564105 19.658371 19.591885 19.094067 19.217898 18.461764 17.157738 17.442862 16.976957 16.310471 16.323477 16.502837
United Kingdom 9.199549 9.233175 8.904123 9.053278 8.989140 8.982939 8.898710 8.617164 8.424424 7.574622 7.857836 7.079298 7.355898 7.145844 6.497440
India 0.979870 0.971698 0.967381 0.992392 1.025028 1.068563 1.121982 1.193210 1.310098 1.431844 1.397009 1.476686 1.598099 1.591438 1.730000
China 2.696862 2.742121 3.007083 3.524074 4.037991 4.523178 4.980314 5.334910 5.701915 6.010102 6.560520 7.241515 7.424751 7.557211 7.543908
Russian Federation 10.627121 10.669603 10.715901 11.090647 11.120627 11.253529 11.669122 11.672457 12.014507 11.023856 11.694348 12.334881 12.784979 12.393556 11.857528
Australia 17.200610 16.733367 17.370452 16.901959 17.026515 17.169711 17.651398 17.865260 18.160876 18.200182 17.740845 17.538878 17.072905 16.095833 15.388766
France 5.946665 6.153061 6.068664 6.115998 6.120079 6.099599 5.906266 5.766385 5.690501 5.438357 5.428981 5.077911 5.075064 5.062174 4.573182
Germany 10.095640 10.366287 10.058673 9.969355 9.898682 9.666372 9.911476 9.488040 9.506321 8.818596 9.279634 9.124859 9.199300 9.390623 8.889370
Canada 17.367115 16.985030 16.559378 17.461199 17.258911 17.251083 16.696694 16.855883 16.875198 15.961560 15.723167 15.639760 14.890636 14.711972 15.117159
Brazil 1.871118 1.898354 1.844380 1.762482 1.828672 1.858088 1.839394 1.901372 2.008670 1.883812 2.132938 2.211587 2.343570 2.488417 2.594388
Argentina 3.835574 3.568600 3.291548 3.525584 4.069058 4.141237 4.434821 4.382669 4.682912 4.410890 4.558500 4.600291 4.569384 4.462904 4.746797
Pakistan 0.768458 0.764702 0.788668 0.804959 0.872802 0.887768 0.929857 0.991030 0.972050 0.950832 0.946268 0.929801 0.918978 0.904316 0.896264
Nepal 0.129282 0.135226 0.106877 0.113902 0.105477 0.120277 0.098812 0.099736 0.129224 0.162087 0.187128 0.202491 0.211798 0.237170 0.283539
Bangladesh 0.211802 0.242020 0.246756 0.256602 0.266823 0.275247 0.299529 0.301631 0.332728 0.357159 0.393937 0.412011 0.433488 0.442401 0.459142
Japan 9.622352 9.464309 9.573130 9.725282 9.909203 9.698883 9.632049 9.782964 9.449534 8.620816 9.148316 9.317427 9.638628 9.780815 9.538706
In [60]:
# Create heatmap
 
plt.figure(figsize=(16,9))
 
sns.heatmap(globalWarming_df)
Out[60]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf81d23d88>
In [61]:
# change heatmap color using cmap
 
plt.figure(figsize=(16,9))
 
sns.heatmap(globalWarming_df, cmap="coolwarm")
Out[61]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf81da6f88>
In [62]:
# If you want to see the value in graph
# annot (annotate) parameter
 
plt.figure(figsize=(16,9))
 
sns.heatmap(globalWarming_df, annot = True)
Out[62]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf81e4a048>

How to change style & format of annot (annotate) using sns.heatmap() annot_kws?

  • fontsize: To change the size of the font
  • fontstyle: To change the style of font like italic, oblique and normal
  • fontfamily: To change the family of font like serif, cursive
  • color: To change the color of font
  • alpha: To change the transparency of the text
  • rotation: To change the rotation of the text
  • verticalalignment: To change the vertical alignment of the text like center, top, bottom, baseline, center_baseline
  • backgroundcolor: To change the background color of the text
In [65]:
# annot_kws parameter
# linewidths draws lines between the cells of the heatmap
 
plt.figure(figsize=(16,9))
 
annot_kws={'fontsize':10, 
           'fontstyle':'italic',  
           'color':"k",
           'alpha':1.0, 
           'rotation':"vertical",
           'verticalalignment':'center',
           'backgroundcolor':'w'}
 
sns.heatmap(globalWarming_df, annot = True, annot_kws= annot_kws, linewidths=4,linecolor="k")
Out[65]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf828f6388>
In [72]:
 
plt.figure(figsize=(20,15))
 
annot_kws={'fontsize':10, 
           'fontstyle':'italic',  
           'color':"k",
           'alpha':1.0, 
           'rotation':"vertical",
           'verticalalignment':'center',
           'backgroundcolor':'w'}
 
ax = sns.heatmap(globalWarming_df, annot = True, annot_kws= annot_kws, linewidths=4,linecolor="k")
# set seaborn heatmap title, x-axis, y-axis label and font size
ax.set(title="Heatmap", xlabel="Years", ylabel="Country Name",)
 
sns.set(font_scale=2) # scale fonts by a factor of 2 for subsequent plots

How to create a seaborn heatmap using correlation matrix?

A common use of a heatmap is to visualize a correlation matrix. When you want to find the relationships between multiple features, and which features are best for building a machine learning model, compute the correlation of the dataset and visualize it with an sns heatmap.

A correlation is a statistical measure of the relationship between two variables (x, y).

A correlation coefficient is a value from -1 to 1.

  • -1: Perfect negative correlation (e.g. as X increases, Y decreases).
  • 0: No correlation (e.g. an increase in X has no effect on Y, and vice versa).
  • 1: Perfect positive correlation (e.g. as X increases, Y increases).
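The three cases can be checked numerically. The sketch below computes the Pearson correlation coefficient with np.corrcoef for three invented relationships; the helper name pearson is made up for this example.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

y_pos = 3 * x + 1    # rises as x rises
y_neg = -2 * x + 5   # falls as x rises
y_none = x ** 2      # symmetric around 0, so no linear relationship


def pearson(a, b):
    """Pearson correlation coefficient of two 1D arrays (hypothetical helper)."""
    return np.corrcoef(a, b)[0, 1]


r_pos = pearson(x, y_pos)
r_neg = pearson(x, y_neg)
r_none = pearson(x, y_none)
print(r_pos, r_neg, r_none)
```

Note that y_none clearly depends on x, yet its coefficient is zero: the correlation coefficient measures only linear relationships.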
In [73]:
# sns heatmap correlation
 
plt.figure(figsize=(16,9))
 
sns.heatmap(globalWarming_df.corr(), annot = True)
Out[73]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf82e9f3c8>
In [74]:
# Upper triangle seaborn heatmap with mask
plt.figure(figsize=(16,9))
 
corr_mx = globalWarming_df.corr() # correlation matrix
 
matrix = np.tril(corr_mx) # lower triangle of the correlation matrix
#numpy's tril() returns the lower triangle; passing it as the mask hides it, so only the upper triangle is drawn.
 
sns.heatmap(corr_mx, mask=matrix)
Out[74]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf859760c8>
In [75]:
# Lower triangle heatmap
 
plt.figure(figsize=(16,9))
 
corr_mx = globalWarming_df.corr() # correlation matrix
 
matrix = np.triu(corr_mx) # upper triangle of the correlation matrix, passed as the mask so that it is hidden
 
sns.heatmap(corr_mx, mask=matrix)
Out[75]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cf84d47348>
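The mask argument hides every cell where the mask is True (or non-zero). Passing np.tril(corr_mx) or np.triu(corr_mx) works because the non-zero entries sit in the triangle to be hidden, although a correlation of exactly 0 in that triangle would not be masked. A common alternative is an explicit boolean mask. The sketch below builds one for a small random DataFrame (hypothetical data).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
demo_df = pd.DataFrame(rng.normal(size=(50, 4)), columns=["a", "b", "c", "d"])
corr_mx = demo_df.corr()

# True on and above the diagonal: those cells are hidden, the lower triangle is shown
mask = np.triu(np.ones_like(corr_mx, dtype=bool))

fig, ax = plt.subplots()
sns.heatmap(corr_mx, mask=mask, annot=True, ax=ax)
```

Because the mask is built from np.ones_like rather than from the correlation values, it does not depend on what the correlations happen to be.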

Simple HeatMap Example

In [76]:
# import libraries
import seaborn as sns # for data visualization
import matplotlib.pyplot as plt # for data visualization
import pandas as pd # for data analysis
 
# load dataset and create DataFrame ready to create heatmap
flights = sns.load_dataset("flights")
flights_df = flights.pivot(index="month", columns="year", values="passengers") # keyword arguments are required in pandas 2.0 and later
 
# set heatmap size
plt.figure(figsize= (16,9)) 
 
# create heatmap seaborn
 
cbar_kws = {"shrink":.8,
           'extend':'max',
           'extendfrac':.2, 
           "drawedges":True} 
 
sns.heatmap(flights_df.corr(), cmap="inferno", annot = True, linewidths = 2, cbar_kws=cbar_kws)
 
plt.title("Heatmap Correlation of 'Flights' Dataset", fontsize = 25)
plt.xlabel("Years", fontsize = 20)
plt.ylabel("Years", fontsize = 20) # corr() compares the year columns, so both axes show years
plt.show()
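sns.heatmap() expects a 2D grid, so long-format data such as the flights dataset (one row per month and year) first has to be reshaped with pivot. A small sketch of that reshaping step, using an invented long-format table:

```python
import pandas as pd

# Hypothetical long-format data: one row per (month, year) pair
long_df = pd.DataFrame({
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "year": [1949, 1949, 1950, 1950],
    "passengers": [112, 118, 115, 126],
})

# Rows become months, columns become years, cells hold the passenger counts
wide_df = long_df.pivot(index="month", columns="year", values="passengers")
print(wide_df)
```

Calling .corr() on such a wide table then compares its columns with each other, which in the flights example means year against year.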

6. Seaborn Pairplot Using Sns.Pairplot()

A seaborn pairplot is used to show the relationship between every pair of variables in a Pandas DataFrame. It works like a seaborn scatter plot, but whereas sns.scatterplot() plots only two variables, sns.pairplot() draws the pairwise plots of multiple features/variables in a grid.

Syntax: sns.pairplot(data, hue=None, hue_order=None, palette=None, vars=None, x_vars=None, y_vars=None, kind='scatter', diag_kind='auto', markers=None, height=2.5, aspect=1, dropna=True, plot_kws=None, diag_kws=None, grid_kws=None, size=None)

In [77]:
# Let's load another dataset
from sklearn.datasets import load_breast_cancer
cancer_dataset = load_breast_cancer()
# create dataframe
cancer_df = pd.DataFrame(np.c_[cancer_dataset['data'],cancer_dataset['target']],
             columns = np.append(cancer_dataset['feature_names'], ['target']))
cancer_df.head(6)
Out[77]:
mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension ... worst texture worst perimeter worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension target
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 ... 17.33 184.60 2019.0 0.1622 0.6656 0.7119 0.2654 0.4601 0.11890 0.0
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 ... 23.41 158.80 1956.0 0.1238 0.1866 0.2416 0.1860 0.2750 0.08902 0.0
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 ... 25.53 152.50 1709.0 0.1444 0.4245 0.4504 0.2430 0.3613 0.08758 0.0
3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 ... 26.50 98.87 567.7 0.2098 0.8663 0.6869 0.2575 0.6638 0.17300 0.0
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 ... 16.67 152.20 1575.0 0.1374 0.2050 0.4000 0.1625 0.2364 0.07678 0.0
5 12.45 15.70 82.57 477.1 0.12780 0.17000 0.1578 0.08089 0.2087 0.07613 ... 23.75 103.40 741.6 0.1791 0.5249 0.5355 0.1741 0.3985 0.12440 0.0

6 rows × 31 columns

In [78]:
#plot seaborn pairplot
sns.pairplot(cancer_df)
Out[78]:
<seaborn.axisgrid.PairGrid at 0x1cf8664e9c8>
In [80]:
# vars: It helps to plot the pairplot for only the required features/variables.
sns.pairplot(cancer_df, vars=['mean radius', 'mean texture','mean perimeter', 'mean area', 'mean smoothness'])
Out[80]:
<seaborn.axisgrid.PairGrid at 0x1cfaf5e1b48>
In [81]:
# hue: Map the third feature to get more insights. Pass string (variable name), optional
sns.pairplot(cancer_df, vars = ['mean radius', 'mean texture', 'mean perimeter', 'mean area',
        'mean smoothness'], hue ='target')
Out[81]:
<seaborn.axisgrid.PairGrid at 0x1cfb0088848>
In [82]:
# hue_order: To change the order of the hue levels. Pass a list of the hue values
sns.pairplot(cancer_df, vars = ['mean radius', 'mean texture', 'mean perimeter', 'mean area',
        'mean smoothness'], hue ='target', hue_order = [1.0, 0.0])
Out[82]:
<seaborn.axisgrid.PairGrid at 0x1cfaff10908>
In [83]:
# x_vars, y_vars: Use these if you want specific features on the x-axis and the y-axis.
sns.pairplot(cancer_df, hue ='target', x_vars = ['mean radius', 'mean texture'], y_vars =['mean radius'])
Out[83]:
<seaborn.axisgrid.PairGrid at 0x1cfb301c848>
In [84]:
# The kind parameter selects the type of plot drawn in the off-diagonal panels
# kind: Pass 'reg' to add a regression line and check linearity. Pass {'scatter', 'reg'}, optional
sns.pairplot(cancer_df, vars = ['mean radius', 'mean texture', 'mean perimeter', 'mean area',
        'mean smoothness'], hue ='target', kind = 'reg')
Out[84]:
<seaborn.axisgrid.PairGrid at 0x1cfb3468fc8>